RxR Questionnaire - 2023 – Lorenzo Mattioli

In 2023, a team of researchers realised a social survey for the Umbrian association una Regione per Restare - RxR. The research focused mainly on two questions:

What compels so many people to leave their region?
Why do some still decide to stay?

As this was the first research project promoted by the association, the sample is patchy and excessively small. Since the survey was administered through snow-balling, there is also an evident self-selection problem (too many students, too many individuals aged 20-25 due to the survey being distributed mainly through universities). These problems have been discussed at length internally and will all be taken into account in future projects. In the meantime, the data was used to conduct a merely descriptive analysis, in order to obtain at list a few hints on the next steps to be taken. A good portion of this exact work was carried out in order to explain the issues with the data collection process itself to a non-technical audience.

About a year after the beginning of the survey’s distribution, I was involved as a data analyst, and contributed to the writing of the final report. Below is a selection of my contributions, both in terms of internal and external communication.

Since the whole work was written in Italian, the graphs and charts are not translated. The source code and data is available at the link on the top right, which leads to Lucia Temperini and I’s GitHub repository. Although code boxes were included in the post for reference, the final graphics were post-produced in Pixelmator. A selection of these are included in the gallery right below, before the actual post.

Read the print-ready report

Gallery

Coverage

Show code

## Domicilio - cleaning
Abitare$dom <- as.character(Abitare$dom)
Abitare$dom <- ifelse(Abitare$dom == 'Stesso della residenza', Abitare$res, Abitare$dom)

## Province
prov <- geoCod |> 
  select(`Denominazione (Italiana e straniera)`, `Unità territoriale sovracomunale`) |> 
  rename(denom = `Denominazione (Italiana e straniera)`,
         prov = `Unità territoriale sovracomunale`)

Abitare <- left_join(Abitare, prov, by = join_by(dom == denom))
Abitare <- Abitare |> 
  relocate(prov, .after = dom) |> 
  rename(prov_dom = prov)

## Select and merge quest data
df <- left_join(Lavorare |> 
                  select(id, occ, gen),
                Abitare |> 
                  select(id, dom, eta))
### factor età
df$eta <- as.numeric(df$eta)
df <- df |> 
  mutate(class_eta = case_match(eta,
                    c(15:19) ~ '15-20',
                    c(20:24) ~ '20-25',
                    c(25:29) ~ '25-30',
                    c(30:34) ~ '30-35',
                    c(35:39) ~ '35-40',
                    c(40:45) ~ '40-45'
  ))

### occ cleaning
df <- df |> 
  mutate(occ = gsub((' (inclusi contratti a nero, precari,  di ricerca, stage, servizio civile)'), '', occ, fixed = T),
         occ = gsub((' (inclusi contratti a nero, precari, di ricerca, stage, servizio civile)'), '', occ, fixed = T))

## dom frequency table
domdf <- df |> 
  group_by(dom) |> 
  count()
### geospatial data merge
domdf <- left_join(domdf,
                sf |>
                  select(COMUNE, geometry),
                by = join_by(dom == COMUNE))

The main problem with the questionnaire was its coverage. In discussing the problem with the rest of our organisation, mapping it out was the best way to explain our dissatisfaction with the results. The tooltip reveals the number of responses in each municipality. The extremely low numbers would be a problem in and of themselves, but the extreme concentration in the city of Perugia (where the University is located) renders the sample impossible to use for statistical inference.

The gray areas are municipalities (comuni) that we could not reach at all.

Show code

# Dataviz ----------------------------------------------------------------------
## Mappa copertura geo

ggdom <- domdf |> 
  ggplot() +
  geom_sf(data = umbriasf, aes(geometry = geometry), colour = 'black', fill = 'gray98') +
  geom_sf_interactive(aes(geometry = geometry, fill = n, data_id = dom, tooltip = n)) +
  geom_sf_text(data = ~. |> filter(dom != 'Perugia'),
               aes(geometry = geometry, label = dom), colour = 'black', size = 3) +
  geom_sf_text(data = ~. |> filter(dom == 'Perugia'),
               aes(geometry = geometry, label = dom), colour = 'white', size = 3) +
  scale_fill_viridis_c(option = 'inferno', direction = -1) +
  labs(title = 'Questionnaire\'s geographic coverage') +
  theme_void() +
  theme(legend.position = 'none',
        plot.title = element_text(family = 'Helvetica', hjust = .5, size = 20))


### Interactive

girafe(ggobj = ggdom,
       width_svg = 8,
       height_svg = 9,
       options = list(
         opts_hover(css = ''),
         opts_hover_inv(css = 'opacity:0.3;'),
         opts_tooltip(css = "background-color:white;
                      color:black;
                      font-family:Helvetica;
                      font-style:empty;
                      padding:8px;
                      border-radius:10px;",
                      use_cursor_pos = T),
         opts_toolbar(position = 'bottomright')))

The extremely unequal socio-demographical composition of the sample did not reassure us in any way. The choice of snow-balling as a distribution method introduced significant self-selection problems, which lead to the situation shown in the figure

Show code

### Genere/età/occ
df |> 
  mutate(gen = case_match(gen,
                          'Donna' ~ 'Female',
                          'Uomo' ~ 'Male',
                          'Non Binario' ~ 'Non-binary'
                          )) |> 
  ggplot(aes(x = class_eta, fill = occ)) +
  geom_bar() +
  scale_fill_met_d('Degas', direction = -1) +
  facet_wrap(vars(gen)) +
  theme_minimal() +
  theme(legend.title = element_blank(),
        legend.position = 'bottom',
        axis.title = element_blank(),
        strip.text = element_text(size = 15),
        plot.title = element_text(hjust = .5, size = 18))

Living conditions

Building on the questionnaire’s questions, an index of satisfaction with one’s living conditions was built. All values are expressed in percentage.

Show code

## Data cleaning/wrangling -----------------------------------------------------
### Domicilio - cleaning
Abitare$dom <- as.character(Abitare$dom)
Abitare$dom <- ifelse(Abitare$dom == 'Stesso della residenza', Abitare$res, Abitare$dom)

## Index recode
abRecode <- function(x) {case_match(paste(x),
        'Per nulla' ~ 1,
        'Poco' ~ 2,
        'Abbastanza' ~ 3,
        'Molto' ~ 4
      )}

Abitare <- Abitare |> 
    mutate(across(c(abQual, abCost, abPriv, abVic, abColl),
                  abRecode
                  )
           )
rm(abRecode)

### Additive index def
Abitare <- Abitare |> 
  mutate(abInd = rowSums(across(c(abQual, abCost, abPriv, abVic, abColl)))) |> 
  mutate(abInd = (abInd-5)/15 * 100) |> 
  mutate(abInd = round(abInd, 2)) |> 
  relocate(abInd, .after = abColl)

Show code

# Dataviz ----------------------------------------------------------------------
## Index by urban zone
abPlotZon <- Abitare |> 
  filter(prov_dom == 'Terni' | prov_dom == 'Perugia') |> 
  group_by(zon) |>
  summarise(meanzon = mean(abInd)) |> 
  mutate(zon = case_match(zon,
                          'Periferia' ~ 'Suburbs',
                          'Centro Cittadino' ~ 'City centre',
                          'Area Rurale' ~ 'Rural area')) |> 
  mutate(ovmean = mean(meanzon),
         flag = ifelse(meanzon > ovmean, T, F),
         zon = factor(zon,
                           levels = zon[order(meanzon)]))

ggPlotZon <- abPlotZon |> 
  ggplot(aes(x = zon, y = meanzon, colour = flag, data_id = zon, tooltip = round(meanzon, 2))) +
  geom_point_interactive(size = 6) +
  geom_segment_interactive(aes(y = ovmean, yend = meanzon, x = zon, xend = zon)) +
  geom_point(size = 4, colour = 'white') +
  scale_y_continuous(n.breaks = 4) +
  geom_hline(yintercept = abPlotZon$ovmean[1], colour = 'gray70', size = 0.3) +
  scale_color_met_d('Degas') +
  coord_flip() +
  labs(title = 'Living condition satisfaction index',
       subtitle = 'Decomposition by area') + 
  theme_minimal() +
  theme(axis.title = element_blank(),
        legend.position = 'none',
        plot.title = element_text(hjust = .5, size = 20),
        plot.subtitle = element_text(hjust = .5, size = 15),
        axis.text.y = element_text(size = 11))


### Interactive graph
girafe(ggobj = ggPlotZon,
       width_svg = 8,
       options = list(
         opts_hover(css = ''), ## CSS code of line we're hovering over
         opts_hover_inv(css = "opacity:0.3;"), ## CSS code of all other lines
         opts_tooltip(css = "background-color:white;
                      color:black;
                      font-family:Helvetica;
                      font-style:empty;
                      padding:8px;
                      border-radius:10px;",
                      use_cursor_pos = T),
         opts_toolbar(position = 'bottomright')))

Should I stay or should I go?

The following section tries to answer the main question directly: what brings people to leave their home? Why do some still decide to stay?

Relationship to Umbria

The first step of the analysis was purely descriptive: how big is the fraction of our sample which left, or would like to? An infographic-style waffle chart was the best choice to convey meaning keeping sense of scale.

Show code

## Rapporto con la regione -----------------------------------------------------

Restare |> 
  group_by(rapp) |> 
  count() |> 
  mutate(rapp = case_match(rapp,
                           'Vorrei restare nel posto in cui vivo' ~
                             'I\'d like to stay where I am',
                           'Sarei contento di vivere e lavorare altrove' ~
                             'I\'d like to live and work somewhere else',
                           'Vorrei restare ma non posso' ~
                             'I\'d like to stay, but I can\'t',
                           'Vorrei partire ma non posso' ~
                             'I\'d like to leave, but I can\'t'),
         rapp = factor(rapp, levels = c(
           'I\'d like to stay where I am',
           'I\'d like to stay, but I can\'t',
           'I\'d like to leave, but I can\'t',
           'I\'d like to live and work somewhere else'
         ))) |> 
  waffle(size = 1,
         flip = T,
         reverse = T,
         legend_pos = 'right')

Reasons for staying

This last section aimed at finding the exact reasons why people either stay or go. Radar plots were the best way to convey the relative importance of each individual reason.

Show code

## Motivi per restare ----------------------------------------------------------

### graphics df
#### counting each column
rest <- tibble(.rows = 4, choice = c('Per nulla',
                                     'Poco',
                                     'Abbastanza',
                                     'Molto'))
for (i in 1:10) {
vec <- Restare[startsWith(names(Restare),"rest")] |> 
    group_by(Restare[startsWith(names(Restare),"rest")][i]) |>
    drop_na() |> 
    count(name = paste(names(Restare[startsWith(names(Restare),"rest")][i]), '_n')) |> 
    rename('choice' = names(Restare[startsWith(names(Restare),"rest")][i]))

rest <- full_join(rest, vec |> mutate(choice = choice))

}
rm(i)
#### data wrangling
rest2 <- data.frame(t(rest[-1])) # swapping columns-rows
colnames(rest2) <- rest$choice

rest <- rownames_to_column(rest2) |>
  mutate(rowname = gsub('_n', '', rowname)) |> # column 
  rename(choice = rowname)

rm(rest2)

rest <- rest |> 
  mutate(index = round(((Abbastanza + Molto)/89)*100, 2)) # % di abbastanza + molto importante

### graphics
labels<-data.frame(
  y = c(25,50,75,100),
  x = rep(0.25,4)
)
rest |> 
  filter(choice != 'restFort ') |> 
  mutate(choice = case_match(choice,
                             'restLeg ' ~ 'Legame/impegno per la comunità',
                             'restSoc ' ~ 'Contatti sociali e umani più gratificanti',
                             'restNat ' ~ 'Contatto con la natura',
                             'restQual '    ~ 'Qualità e stile di vita',
                             'restOpp ' ~ 'Opportunità anche nel restare',
                             'restImp ' ~ 'Idea imprenditoriale',
                             'restFam ' ~ 'Esigenze personali/familiari',
                             'restCost '    ~ 'Costo della vita più basso',
                             'restAmb ' ~ 'Scarsa importanza alla carriera'
                             )) |> 
  ggplot(aes(x = choice, y = index, fill = choice)) +
  geom_col() +
  coord_polar() +
  scale_y_continuous(limits = c(0, 85)) +
  labs(title = 'Motivi per restare') +
  scale_fill_manual(values = met.brewer('Tiepolo', 9)) +
  theme_void() +
  theme(axis.title = element_blank(),
        legend.position = 'right',
        legend.title = element_blank(),
        plot.title = element_text(size = 20, hjust = .5))
## Motivi per lasciare ---------------------------------------------------------

### graphics df
#### counting each column
lasc <- tibble(.rows = 4, choice = c('Abbastanza', 'Molto', 'Per nulla', 'Poco'))
for (i in 1:10) {
  vec <- Restare[startsWith(names(Restare),"lasc")] |> 
    group_by(Restare[startsWith(names(Restare),"lasc")][i]) |>
    drop_na() |> 
    count(name = paste(names(Restare[startsWith(names(Restare),"lasc")][i]), '_n')) |> 
    rename('choice' = names(Restare[startsWith(names(Restare),"lasc")][i]))
  
  lasc <- full_join(lasc, vec)
  
}
rm(i)
#### data wrangling
lasc2 <- data.frame(t(lasc[-1])) # swapping columns-rows
colnames(lasc2) <-  lasc$choice

lasc <- rownames_to_column(lasc2) |>
  mutate(rowname = gsub('_n', '', rowname)) |> # column 
  rename(choice = rowname)

rm(lasc2)

lasc <- lasc |> 
  mutate(index = round(((Abbastanza + Molto)/166)*100, 2)) # % di abbastanza + molto importante

### graphics
labels<-data.frame(
  y = c(25,50,75,100),
  x = rep(0.25,4)
)
lasc |> 
  mutate(choice = case_match(choice,
                             'lascEsp ' ~ 'Ampliare i propri orizzonti',
                             'lascOpp ' ~ 'Formazione/offerte di lavoro',
                             'lascImp ' ~ 'Idea imprenditoriale',
                             'lascEst ' ~ 'Bellezza estetica delle città',
                             'lascSoc ' ~ 'Relazioni sociali',
                             'lascFam ' ~ 'Realizzazione familiare',
                             'lascServ '    ~ 'Offerta di servizi',
                             'lascRit ' ~ 'Realizzarsi per poi tornare',
                             'lascTent '    ~ 'Tentare a realizzarsi',
                             'lascCult '    ~ 'Vita culturale più intensa'
  )) |> 
  ggplot(aes(x = choice, y = index, fill = choice)) +
  geom_col() +
  coord_polar() +
  scale_y_continuous(limits = c(0, 95)) +
  labs(title = 'Motivi per andare') +
  scale_fill_manual(values = met.brewer('Tiepolo', 10)) +
  theme_void() +
  theme(axis.title = element_blank(),
        legend.position = 'right',
        legend.title = element_blank(),
        plot.title = element_text(size = 20, hjust = .5))